Goto

Collaborating Authors

 Rush County



HydraRAG: Structured Cross-Source Enhanced Large Language Model Reasoning

Tan, Xingyu, Wang, Xiaoyang, Liu, Qing, Xu, Xiwei, Yuan, Xin, Zhu, Liming, Zhang, Wenjie

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. Current hybrid RAG system retrieves evidence from both knowledge graphs (KGs) and text documents to support LLM reasoning. However, it faces challenges like handling multi-hop reasoning, multi-entity questions, multi-source verification, and effective graph utilization. To address these limitations, we present HydraRAG, a training-free framework that unifies graph topology, document semantics, and source reliability to support deep, faithful reasoning in LLMs. HydraRAG handles multi-hop and multi-entity problems through agent-driven exploration that combines structured and unstructured retrieval, increasing both diversity and precision of evidence. To tackle multi-source verification, HydraRAG uses a tri-factor cross-source verification (source trustworthiness assessment, cross-source corroboration, and entity-path alignment), to balance topic relevance with cross-modal agreement. By leveraging graph structure, HydraRAG fuses heterogeneous sources, guides efficient exploration, and prunes noise early. Comprehensive experiments on seven benchmark datasets show that HydraRAG achieves overall state-of-the-art results on all benchmarks with GPT-3.5-Turbo, outperforming the strong hybrid baseline ToG-2 by an average of 20.3% and up to 30.1%. Furthermore, HydraRAG enables smaller models (e.g., Llama-3.1-8B) to achieve reasoning performance comparable to that of GPT-4-Turbo. The source code is available on https://stevetantan.github.io/HydraRAG/.


Aligning ESG Controversy Data with International Guidelines through Semi-Automatic Ontology Construction

Iwata, Tsuyoshi, Comte, Guillaume, Flores, Melissa, Kondo, Ryoma, Hisano, Ryohei

arXiv.org Artificial Intelligence

The growing importance of environmental, social, and governance data in regulatory and investment contexts has increased the need for accurate, interpretable, and internationally aligned representations of non-financial risks, particularly those reported in unstructured news sources. However, aligning such controversy-related data with principle-based normative frameworks, such as the United Nations Global Compact or Sustainable Development Goals, presents significant challenges. These frameworks are typically expressed in abstract language, lack standardized taxonomies, and differ from the proprietary classification systems used by commercial data providers. In this paper, we present a semi-automatic method for constructing structured knowledge representations of environmental, social, and governance events reported in the news. Our approach uses lightweight ontology design, formal pattern modeling, and large language models to convert normative principles into reusable templates expressed in the Resource Description Framework. These templates are used to extract relevant information from news content and populate a structured knowledge graph that links reported incidents to specific framework principles. The result is a scalable and transparent framework for identifying and interpreting non-compliance with international sustainability guidelines.


OntView: What you See is What you Meant

Bobed, Carlos, Quintana, Carlota, Mena, Eduardo, Bobed, Jorge, Bobillo, Fernando

arXiv.org Artificial Intelligence

In the field of knowledge management and computer science, ontologies provide a structured framework for modeling domain-specific knowledge by defining concepts and their relationships. However, the lack of tools that provide effective visualization is still a significant challenge. While numerous ontology editors and viewers exist, most of them fail to graphically represent ontology structures in a meaningful and non-overwhelming way, limiting users' ability to comprehend dependencies and properties within large ontological frameworks. In this paper, we present OntView, an ontology viewer that is designed to provide users with an intuitive visual representation of ontology concepts and their formal definitions through a user-friendly interface. Building on the use of a DL reasoner, OntView follows a "What you see is what you meant" paradigm, showing the actual inferred knowledge. One key aspect for this is its ability to visualize General Concept Inclusions (GCI), a feature absent in existing visualization tools. Moreover, to avoid a possible information overload, Ontview also offers different ways to show a simplified view of the ontology by: 1) creating ontology summaries by assessing the importance of the concepts (according to different available algorithms), 2) focusing the visualization on the existing TBox elements between two given classes and 3) allowing to hide/show different branches in a dynamic way without losing the semantics. OntView has been released with an open-source license for the whole community.


A Schema-aware Logic Reformulation for Graph Reachability

Di Pierro, Davide, Ferilli, Stefano

arXiv.org Artificial Intelligence

Graph reachability is the task of understanding whether two distinct points in a graph are interconnected by arcs to which in general a semantic is attached. Reachability has plenty of applications, ranging from motion planning to routing. Improving reachability requires structural knowledge of relations so as to avoid the complexity of traditional depth-first and breadth-first strategies, implemented in logic languages. In some contexts, graphs are enriched with their schema definitions establishing domain and range for every arc. The introduction of a schema-aware formalization for guiding the search may result in a sensitive improvement by cutting out unuseful paths and prioritising those that, in principle, reach the target earlier. In this work, we propose a strategy to automatically exclude and sort certain graph paths by exploiting the higher-level conceptualization of instances. The aim is to obtain a new first-order logic reformulation of the graph reachability scenario, capable of improving the traditional algorithms in terms of time, space requirements, and number of backtracks. The experiments exhibit the expected advantages of the approach in reducing the number of backtracks during the search strategy, resulting in saving time and space as well.


Tripl\`etoile: Extraction of Knowledge from Microblogging Text

Zavarella, Vanni, Consoli, Sergio, Recupero, Diego Reforgiato, Fenu, Gianni, Angioni, Simone, Buscaldi, Davide, Dessì, Danilo, Osborne, Francesco

arXiv.org Artificial Intelligence

Numerous methods and pipelines have recently emerged for the automatic extraction of knowledge graphs from documents such as scientific publications and patents. However, adapting these methods to incorporate alternative text sources like micro-blogging posts and news has proven challenging as they struggle to model open-domain entities and relations, typically found in these sources. In this paper, we propose an enhanced information extraction pipeline tailored to the extraction of a knowledge graph comprising open-domain entities from micro-blogging posts on social media platforms. Our pipeline leverages dependency parsing and classifies entity relations in an unsupervised manner through hierarchical clustering over word embeddings. We provide a use case on extracting semantic triples from a corpus of 100 thousand tweets about digital transformation and publicly release the generated knowledge graph. On the same dataset, we conduct two experimental evaluations, showing that the system produces triples with precision over 95% and outperforms similar pipelines of around 5% in terms of precision, while generating a comparatively higher number of triples.


Microsoft Cloud-based Digitization Workflow with Rich Metadata Acquisition for Cultural Heritage Objects

Kutt, Krzysztof, Gomułka, Jakub, Miranda, Luiz do Valle, Nalepa, Grzegorz J.

arXiv.org Artificial Intelligence

In response to several cultural heritage initiatives at the Jagiellonian University, we have developed a new digitization workflow in collaboration with the Jagiellonian Library (JL). The solution is based on easy-to-access technological solutions -- Microsoft 365 cloud with MS Excel files as metadata acquisition interfaces, Office Script for validation, and MS Sharepoint for storage -- that allows metadata acquisition by domain experts (philologists, historians, philosophers, librarians, archivists, curators, etc.) regardless of their experience with information systems. The ultimate goal is to create a knowledge graph that describes the analyzed holdings, linked to general knowledge bases, as well as to other cultural heritage collections, so careful attention is paid to the high accuracy of metadata and proper links to external sources. The workflow has already been evaluated in two pilots in the DiHeLib project focused on digitizing the so-called "Berlin Collection" and in two workshops with international guests, which allowed for its refinement and confirmation of its correctness and usability for JL. As the proposed workflow does not interfere with existing systems or domain guidelines regarding digitization and basic metadata collection in a given institution (e.g., file type, image quality, use of Dublin Core/MARC-21), but extends them in order to enable rich metadata collection, not previously possible, we believe that it could be of interest to all GLAMs (galleries, libraries, archives, and museums).


A Farewell to Harms: Risk Management for Medical Devices via the Riskman Ontology & Shapes

Gorczyca, Piotr, Arndt, Dörthe, Diller, Martin, Kettmann, Pascal, Mennicke, Stephan, Strass, Hannes

arXiv.org Artificial Intelligence

We introduce the Riskman ontology & shapes for representing and analysing information about risk management for medical devices. Risk management is concerned with taking necessary precautions so a medical device does not cause harms for users or the environment. To date, risk management documentation is submitted to notified bodies (for certification) in the form of semi-structured natural language text. We propose to use classes from the Riskman ontology to logically model risk management documentation, and to use the included SHACL constraints to check for syntactic completeness and conformity to relevant standards. In particular, the ontology is modelled after ISO 14971 and the recently published VDE Spec 90025. Our proposed methodology has the potential to save many person-hours for both manufacturers (when creating risk management documentation) as well as notified bodies (when assessing submitted applications for certification), and thus offers considerable benefits for healthcare and, by extension, society as a whole.


Foundations for Digital Twins

Hurley, Regina, Maxwell, Dan, McLellan, Jon, Wilson, Finn, Beverley, John

arXiv.org Artificial Intelligence

The growing reliance on digital twins across various industries and domains brings with it semantic interoperability challenges. Ontologies are a well-known strategy for addressing such challenges, though given the complexity of the phenomenon, there are risks of reintroducing the interoperability challenges at the level of ontology representations. In the interest of avoiding such pitfalls, we introduce and defend characterizations of digital twins within the context of the Common Core Ontologies, an extension of the widely-used Basic Formal Ontology. We provide a set of definitions and design patterns relevant to the domain of digital twins, highlighted by illustrative use cases of digital twins and their physical counterparts. In doing so, we provide a foundation on which to build more sophisticated ontological content related and connected to digital twins.


On the Value of Labeled Data and Symbolic Methods for Hidden Neuron Activation Analysis

Dalal, Abhilekha, Rayan, Rushrukh, Barua, Adrita, Vasserman, Eugene Y., Sarker, Md Kamruzzaman, Hitzler, Pascal

arXiv.org Artificial Intelligence

A major challenge in Explainable AI is in correctly interpreting activations of hidden neurons: accurate interpretations would help answer the question of what a deep learning system internally detects as relevant in the input, demystifying the otherwise black-box nature of deep learning systems. The state of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans, but systematic automated methods that would be able to hypothesize and verify interpretations of hidden neuron activations are underexplored. This is particularly the case for approaches that can both draw explanations from substantial background knowledge, and that are based on inherently explainable (symbolic) methods. In this paper, we introduce a novel model-agnostic post-hoc Explainable AI method demonstrating that it provides meaningful interpretations. Our approach is based on using a Wikipedia-derived concept hierarchy with approximately 2 million classes as background knowledge, and utilizes OWL-reasoning-based Concept Induction for explanation generation. Additionally, we explore and compare the capabilities of off-the-shelf pre-trained multimodal-based explainable methods. Our results indicate that our approach can automatically attach meaningful class expressions as explanations to individual neurons in the dense layer of a Convolutional Neural Network. Evaluation through statistical analysis and degree of concept activation in the hidden layer show that our method provides a competitive edge in both quantitative and qualitative aspects compared to prior work.